Web Usage Mining Framework
Abstract
We present a tool that alleviates the tedious work of data pre-processing and data structuring in support of scientific research on web usage mining (WUM). The framework also contains web usage mining models and methods for basic statistics on the structured data. The main contribution of the tool is a tree-like visualization of frequent navigational sequences, which gives an easily interpretable view of patterns with the relevant information highlighted. The demonstration gives an overview of the functionalities of the framework.

1 WUM framework

Web servers provide the available web content in response to user requests. They collect all the information on request activities into so-called log files. Log data are a rich source for web usage mining. Much scientific research addresses web usage mining, and the exploration of user behaviour in particular. There is also great demand in the business sector for personalized, custom-designed systems that closely match the requirements of users. Many commercial web access log mining tools are on the market; however, none of them supports scientific experimentation.

We present a framework, developed within the DIANA project [1], to facilitate research on web usage mining. The framework can be used to pre-process and structure web logs, and also as a test bed for web usage mining models. Figure 1 shows an overall scheme of the framework. It consists of three main parts: data preparation, session identification and profile mining.

Figure 1: An overall scheme of the web usage mining framework

The framework handles three sources of information: the web access log data, a mapping table that maps visited pages to unique numbers, and extra information about the visit and about the users. The first step is to pre-process the data and load them into a database. In the session identification part, the user sessions are extracted from the database.
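The session identification step can be illustrated with a minimal sketch. The text does not say how the framework delimits sessions, so the inactivity timeout used here is an assumption (a common heuristic in web usage mining), and the names `SessionIdentifier`, `Request` and `identifySessions` are hypothetical rather than part of the framework. Page numbers stand for the unique numbers produced by the mapping table.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of timeout-based session identification; not the framework's actual code. */
class SessionIdentifier {

    /** One pre-processed log entry: who requested which (already numbered) page, and when. */
    static class Request {
        final String userId;
        final long timestampSeconds;
        final int pageId;

        Request(String userId, long timestampSeconds, int pageId) {
            this.userId = userId;
            this.timestampSeconds = timestampSeconds;
            this.pageId = pageId;
        }
    }

    /**
     * Splits each user's requests into sessions. A new session starts whenever the gap
     * between two consecutive requests of the same user exceeds timeoutSeconds (assumed rule).
     * Each session is returned as the ordered list of visited page numbers.
     */
    static List<List<Integer>> identifySessions(List<Request> requests, long timeoutSeconds) {
        // Group requests per user, keeping users in order of first appearance.
        Map<String, List<Request>> byUser = new LinkedHashMap<>();
        for (Request r : requests) {
            byUser.computeIfAbsent(r.userId, k -> new ArrayList<>()).add(r);
        }

        List<List<Integer>> sessions = new ArrayList<>();
        for (List<Request> userRequests : byUser.values()) {
            // Order each user's requests by time before looking for gaps.
            userRequests.sort(Comparator.comparingLong((Request r) -> r.timestampSeconds));
            List<Integer> current = new ArrayList<>();
            long previous = 0;
            for (Request r : userRequests) {
                if (!current.isEmpty() && r.timestampSeconds - previous > timeoutSeconds) {
                    sessions.add(current);          // gap too long: close the running session
                    current = new ArrayList<>();
                }
                current.add(r.pageId);
                previous = r.timestampSeconds;
            }
            if (!current.isEmpty()) {
                sessions.add(current);
            }
        }
        return sessions;
    }
}
```

In a real deployment the requests would come from the database filled during data preparation; the timeout (for example 1800 seconds) would be a tunable parameter rather than a fixed value.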
We can group these sessions based on the extra information. The sessions can either be used in a separate environment as the basis for research, or they can be mined by the built-in algorithms, as the third step in Figure 1 indicates. The framework contains web usage mining models (frequent itemset and association rule mining [2], frequent sequence mining [3], etc.) as well as methods for basic statistics on the structured data. The main contribution of this framework is a visualization of frequent navigational sequences.

2 Visualization of frequent navigational sequences

The visualization presents a tree-like view of navigational patterns that highlights relevant information and can be interpreted easily. Figure 2 presents an example of such a tree. The tree consists of nodes whose labels relate to their content type and are shown in specific colours. There is a special virtual node, the root of all sequences, which contains additional information (name of the tree, support rate). Nodes are connected by lines (edges) whose thickness marks the frequency of the given path. Edges are labelled with the percentage of (sub)sequences crossing the given node. The absolute number of sequences finished at the given node is given in parentheses. The size of the tree is controlled by the so-called support threshold: the tree contains only (sub)sessions that are more frequent than this threshold.

Figure 2: An example of tree-like visualization

3 Technology and System requirements

• The framework is implemented in Java and is therefore platform independent.
• It also needs a database management system to store and manipulate data. Any system that supports JDBC, such as MySQL, Oracle or Sybase, is sufficient.
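The tree described in Section 2 can be approximated with a short sketch. It is a simplification under stated assumptions: sessions are stored in a plain prefix tree rather than being mined with the frequent sequence algorithm of [3], percentages are taken relative to the total number of sessions, and the output is indented text instead of a coloured drawing. The class `NavigationTree` and its methods are hypothetical names, not the framework's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a navigation tree; a simplification, not the framework's algorithm. */
class NavigationTree {

    /** A tree node: how many sessions cross it and how many finish at it. */
    static class Node {
        int crossing;   // sessions whose path goes through this node
        int finished;   // sessions that end exactly at this node
        final Map<Integer, Node> children = new TreeMap<>(); // keyed by page number
    }

    /** Virtual root shared by all sequences; its crossing count is the number of sessions. */
    final Node root = new Node();

    /** Adds one session (ordered page numbers) to the tree. */
    void add(List<Integer> session) {
        Node node = root;
        node.crossing++;
        for (int page : session) {
            node = node.children.computeIfAbsent(page, k -> new Node());
            node.crossing++;
        }
        node.finished++;
    }

    /** Renders only the paths whose relative frequency is above minSupport (a fraction in [0, 1]). */
    List<String> render(double minSupport) {
        List<String> lines = new ArrayList<>();
        render(root, "", minSupport, lines);
        return lines;
    }

    private void render(Node node, String indent, double minSupport, List<String> lines) {
        for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
            Node child = e.getValue();
            double support = (double) child.crossing / root.crossing;
            if (support <= minSupport) {
                continue; // prune branches that are not more frequent than the threshold
            }
            long percent = Math.round(100 * support);
            // Label: percentage of sessions crossing the node, finished count in parentheses.
            lines.add(indent + "page " + e.getKey() + " " + percent + "% (" + child.finished + ")");
            render(child, indent + "  ", minSupport, lines);
        }
    }
}
```

Raising the threshold passed to `render` shrinks the tree, mirroring how the support threshold controls the size of the visualization.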